Abstract

We consider spectrally-efficient communication over a Rayleigh $N$-block-fading channel with a $K$-sparse $L$-length discrete-time impulse response (for $0 < K < L < N$), where neither the transmitter nor the receiver knows the channel's coefficients or its support. The high-SNR ergodic capacity of this channel has been shown to obey $C(\mathrm{SNR}) = (1 - K/N)\log_2(\mathrm{SNR}) + O(1)$. Consequently, any pilot-aided scheme that sacrifices more than $K$ dimensions per fading block to pilots will be spectrally inefficient. This raises concern about the conventional "compressed channel sensing" approach, which uses $\Theta(K \operatorname{polylog}(L))$ pilots. In this paper, we demonstrate that practical spectrally-efficient communication is indeed possible. To this end, we propose a novel belief-propagation-based reception scheme for use with a standard bit-interleaved coded orthogonal frequency division multiplexing (OFDM) transmitter. In particular, we leverage the "relaxed belief propagation" methodology, which allows us to perform joint sparse-channel estimation and data decoding with only $O(LN)$ complexity. Empirical results show that our receiver achieves the desired capacity pre-log factor of $1 - K/N$ and performs near genie-aided bounds at both low and high SNR.
I. Introduction

Our goal is to communicate, in a spectrally efficient manner, over a Rayleigh $N$-block-fading channel with a $K$-sparse discrete-time impulse response of length $L$ (where $0 < K < L < N$), under the realistic assumption that neither the transmitter nor the receiver knows the channel's coefficients or its support. It has been recently shown [1] that the ergodic capacity of this noncoherent sparse channel obeys

$$ C_{\text{sparse}}(\mathrm{SNR}) = \Big(1 - \frac{K}{N}\Big)\log_2(\mathrm{SNR}) + O(1) \tag{1} $$

as the signal-to-noise ratio (SNR) grows large. For comparison, the high-SNR ergodic noncoherent capacity of the Rayleigh $N$-block-fading $L$-length non-sparse channel obeys [2]

$$ C_{\text{non-sparse}}(\mathrm{SNR}) = \Big(1 - \frac{L}{N}\Big)\log_2(\mathrm{SNR}) + O(1), \tag{2} $$

which exhibits a lower pre-log factor than (1). Thus, information theory confirms that channel sparsity can indeed be exploited to increase spectral efficiency, at least at high SNR. In particular, it establishes that, in the high-SNR regime, the signaling scheme does not need to sacrifice more than $K$ degrees of freedom per fading block to mitigate the effects of not knowing the $K$ non-zero channel coefficients or their locations.

Among the many strategies that exist for communication over unknown channels, pilot-aided transmission (PAT) [3] has emerged as one of the most effective. For example, it is known [2] that, for the Rayleigh $N$-block-fading $L$-length non-sparse channel, PAT achieves rates in accordance with the capacity expression (2). It is then not surprising that the vast majority of techniques recently proposed for communication over sparse channels are also based on PAT (see, e.g., the extensive bibliography in [4]). Broadly speaking, these techniques exploit channel sparsity to reduce the number of pilots needed for accurate channel estimation, with the end goal of increasing spectral efficiency. Typically, these schemes take a decoupled approach to reception: a sparse-channel estimate is calculated from pilot observations using a practical compressed sensing algorithm like LASSO [5], [6], and the channel estimate is subsequently used for data decoding. Hereafter, we shall refer to this decoupled approach as "compressed channel sensing" (CCS), after [4]. When $\Theta(K \operatorname{polylog}(L))$ pilots² are used for CCS, the theory of compressed sensing guarantees that, with high probability, the resulting channel estimates will be accurate; e.g., their squared error will decrease in proportion to the received noise variance [4].

¹ Note that (1) and (2) specify only that the maximum rate of reliable communication grows in linear proportion to $\log(\mathrm{SNR})$ according to the specified pre-log factor; the exact value of the capacity remains unspecified due to the $O(1)$ term.

² The use of $\Theta(K \operatorname{polylog}(L))$ pilots corresponds to the case of OFDM-based transmission, which is the case that we focus on later in this paper.

While the use of $\Theta(K \operatorname{polylog}(L))$ pilots may be an improvement over the $L$ pilots required when channel sparsity is not taken into account, the capacity expression (1) implies that any PAT scheme sacrificing more than $K$ degrees of freedom (per fading block) to pilots will be spectrally inefficient in the high-SNR regime. Thus, any scheme based on CCS, which uses $\Theta(K \operatorname{polylog}(L)) > K$ pilots, will fall short of maximizing spectral efficiency. One may then wonder whether there exists a practical³ communication scheme that achieves the capacity pre-log factor in (1).

In this paper, we propose a novel approach to communication over sparse channels that (empirically) achieves rates in accordance with the sparse-channel capacity expression (1). For transmission, we use a conventional scheme based on bit-interleaved coded modulation (BICM) with orthogonal frequency division multiplexing (OFDM) and a few carefully placed training bits. For reception, we deviate from the CCS approach and perform sparse-channel estimation and data decoding jointly. To accomplish this latter task in a practical manner, we take an approach suggested by belief propagation (BP) [7], leveraging recent advances in "relaxed BP" [8], [9] and in BP-based soft-input/soft-output (SISO) decoding [10]. The scheme that we propose has very low computational complexity: only $O(NL)$ multiplies per fading block are required. Thus, we are able to handle long channels, many subcarriers, and large QAM constellations (which are in fact necessary to achieve high spectral efficiencies). Our simulations, for example, use $N = 1021$ subcarriers, up to 256-point QAM constellations, $\approx 10000$-bit LDPC codes, and channels with length $L = 256$ and average sparsity $\mathrm{E}\{K\} = 64$. Under these conditions, we find that our scheme yields error rates that are close to genie-aided bounds, and far superior to CCS, in both low- and high-SNR regimes. Moreover, we find that the outage rate behavior of our scheme coincides with the sparse-channel capacity expression (1).

We will now place our work in context. The basic idea of using BP for joint channel estimation and decoding (JCED) has been around for more than a decade (see, e.g., the early overview [11] and the more recent works [12], [13]). The standard rules of BP specify that messages are passed among nodes of the factor graph according to the sum-product algorithm (SPA) [7]. However, since exact implementation of the SPA on the JCED factor graph is impractical in many cases, the SPA must be approximated, and there is considerable freedom as to how this can be done. In fact, many well-known iterative estimation algorithms can be recognized as particular approximations of SPA-BP: the expectation-maximization (EM) algorithm [14], particle filtering [15], variational Bayes (or "mean-field") [16], and even steepest descent [17]. Not surprisingly, this plurality of possibilities has yielded numerous BP-based JCED designs for frequency-selective channels (e.g., [18], [19], [20], [21]).

³ In [1], a scheme that achieves the pre-log factor in (1) was proposed, but it is impractical in the sense that its complexity grows exponentially with the fading-block length $N$.

Our work is distinct from the existing BP-based JCED literature in two respects. First, we model the channel as a priori sparse (i.e., the coefficients are non-Gaussian), whereas in all of the existing BP-based JCED work that we are aware of, the channel coefficients⁴ are modeled as Gaussian. Second, we leverage a state-of-the-art BP approximation known as "relaxed BP" (RBP), which has been rigorously analyzed and shown to yield asymptotically exact posteriors (as the problem dimensions $N, L \to \infty$ and under certain technical assumptions on the mixing matrix) [9]. In fact, we conjecture that the success of our method is due in large part to the principled approximations used within RBP. We also note that, although we focus on the case of sparse channels, our approach would be applicable to non-sparse channels or, e.g., non-sparse channels with unknown length [18], with minor modification of the assumed channel priors.

Our paper is organized as follows. In Section II we detail the system model, and in Section III we detail our RBP-based JCED approach. In Section IV we report the results of our simulation study, and in Section V we conclude.

II. System Model 

We assume an OFDM-based transmitter that uses a total of $N$ subcarriers, each modulated by a QAM symbol from a $2^M$-ary unit-energy constellation $\mathbb{S}$. Of these subcarriers, $N_p$ are dedicated as pilots⁵ and the remaining $N_d = N - N_p$ are used to transmit a total of $M_T$ training bits and $M_C = N_d M - M_T$ coded/interleaved data bits. To generate the latter, we encode $M_I$ information bits using a rate-$R$ coder, interleave them, and partition the resulting $M_I/R$ coded bits among an integer number $T = M_I/(R M_C)$ of OFDM symbols. The resulting scheme has a spectral efficiency of $\eta = M_C R/N$ information bits per channel use (bpcu). It should be emphasized that our model supports both subcarriers whose QAM symbols are completely known to the receiver ("pilot subcarriers") and subcarriers whose QAM symbols are only partially known to the receiver (via "training bits"). In our nomenclature, the known bits that make up a "pilot subcarrier" are distinct from the "training bits" that may be sprinkled among the "data subcarriers."
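The rate bookkeeping above can be checked with a few lines of Python. This is an illustrative sketch: the helper name `code_rate_for_efficiency` is hypothetical, and the inputs ($N = 1021$, 64-QAM so $M = 6$, $\eta = 3$ bpcu, $K = 64$) are borrowed from the simulation setup purely as example values.

```python
def code_rate_for_efficiency(eta, N, M, N_p, M_T):
    """Return (R, M_C) such that eta = M_C * R / N, following the rate bookkeeping above."""
    N_d = N - N_p                  # data subcarriers per OFDM symbol
    M_C = N_d * M - M_T            # coded/interleaved data bits per OFDM symbol
    R = eta * N / M_C              # code rate needed for the target spectral efficiency
    if not 0.0 < R <= 1.0:
        raise ValueError("target spectral efficiency is not achievable with these parameters")
    return R, M_C


# Hypothetical examples: N = 1021 subcarriers, 64-QAM (M = 6), eta = 3 bpcu.
R_train, M_C_train = code_rate_for_efficiency(3.0, N=1021, M=6, N_p=0, M_T=6 * 64)
R_pilot, M_C_pilot = code_rate_for_efficiency(3.0, N=1021, M=6, N_p=256, M_T=0)
print(M_C_train, round(R_train, 4))   # 5742 0.5334
print(M_C_pilot, round(R_pilot, 4))   # 4590 0.6673
```

The second call dedicates 256 subcarriers to pilots and uses no training bits. Because $M_C$ shrinks, the code rate needed to hold the same $\eta$ rises.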

⁴ After submitting this manuscript, we became aware of the related work [22], which applies BP to JCED for flat-fading Gaussian channel coefficients and non-Gaussian interference.

⁵ For our BP-based JCED, we will see in Section IV that $(N_p, M_T) = (0, MK)$ is most effective.
In the sequel, we use $s^{(k)} \in \mathbb{S}$ for $k \in \{1, \dots, 2^M\}$ to denote the $k$th element of the QAM constellation, and $c^{(k)} = (c^{(k)}_1, \dots, c^{(k)}_M)^T \in \{0,1\}^M$ to denote the bits corresponding to $s^{(k)}$ as defined by the symbol mapping. Likewise, we use $s_i[t] \in \mathbb{S}$ to denote the QAM symbol transmitted on the $i$th subcarrier of the $t$th OFDM symbol and $c_i[t] = (c_{i,1}[t], \dots, c_{i,M}[t])^T \in \{0,1\}^M$ to denote the (coded/interleaved, training, or pilot) bits corresponding to that symbol. We then collect the $NM$ bits that make up the $t$th OFDM symbol into $c[t] = (c_0[t]^T, \dots, c_{N-1}[t]^T)^T$, and we collect the $NMT$ bits that make up the entire (interleaved) codeword into $c = (c[1]^T, \dots, c[T]^T)^T \in \{0,1\}^{NMT}$. The elements of $c$ that are known a priori as pilot or training bits will be referred to as $c_{pt}$. The remainder of $c$ is determined from the information bits $b = (b_1, \dots, b_{M_I})^T$ by coding/interleaving.

We use the standard OFDM model (see, e.g., [23]) for the received value on subcarrier $i$ of OFDM symbol $t$:

$$ y_i[t] = s_i[t]\, z_i[t] + v_i[t], \tag{3} $$

where $z_i[t] \in \mathbb{C}$ denotes the $i$th subcarrier's gain and $v_i[t]$ denotes circular white Gaussian noise with variance $\mu^v$. As usual, the subcarrier gains $z[t] = (z_0[t], \dots, z_{N-1}[t])^T$ are related to the baud-spaced channel impulse response vector $x[t] = (x_0[t], \dots, x_{L-1}[t])^T$ via $z_i[t] = \sum_{j=0}^{L-1} \Phi_{ij} x_j[t]$, where $\Phi_{ij} = e^{-\sqrt{-1}\,2\pi ij/N}$ can be recognized as the $(i,j)$th element of the $N$-DFT matrix $\Phi$. Throughout, we will use $j$ to index the lag of the impulse response. We assume that the channel is block-fading with fading interval $N$, so that the vectors $\{x[t]\}_{t=1}^{T}$ are i.i.d. across $t$. To simplify our development of the algorithm, we assume in the sequel that $T = 1$ and drop the "$[t]$" notation for brevity. However, for the simulations in Section IV we revert to general $T$ in order to facilitate the use of long LDPC codewords.
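A minimal NumPy sketch of the observation model (3) follows. It assumes unit-energy QPSK symbols and a toy size ($N = 16$, $L = 4$) chosen only for illustration, and the helper names are hypothetical. The final check relies on the fact that $z = \Phi x$ with $\Phi_{ij} = e^{-\sqrt{-1}\,2\pi ij/N}$ is a zero-padded length-$N$ DFT, which `numpy.fft.fft` computes with the same sign convention.

```python
import numpy as np

def dft_mixing_matrix(N, L):
    """Phi[i, j] = exp(-sqrt(-1) * 2*pi*i*j / N): the first L columns of the unnormalized N-DFT matrix."""
    i = np.arange(N)[:, None]
    j = np.arange(L)[None, :]
    return np.exp(-2j * np.pi * i * j / N)

def ofdm_observations(s, x, mu_v, rng):
    """y_i = s_i z_i + v_i with z = Phi x and circular white Gaussian v of variance mu_v."""
    N, L = s.size, x.size
    z = dft_mixing_matrix(N, L) @ x
    v = np.sqrt(mu_v / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return s * z + v, z

rng = np.random.default_rng(0)
N, L = 16, 4                                                       # hypothetical toy sizes
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)   # unit-energy constellation
s = rng.choice(qpsk, size=N)
y, z = ofdm_observations(s, x, mu_v=1e-2, rng=rng)
# The subcarrier gains equal a zero-padded length-N FFT of the impulse response.
print(np.allclose(z, np.fft.fft(x, n=N)))   # True
```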

As described in Section I, the focus of the paper is on block-Rayleigh-fading channels with sparse impulse responses $\{x_j\}$. To model sparsity, we treat the impulse response coefficients as random variables $\{X_j\}$ with the independent Bernoulli-Gaussian prior pdf

$$ p_{X_j}(x) = \lambda_j\, \mathcal{CN}(x; 0, \mu_j) + (1 - \lambda_j)\, \delta(x), \tag{4} $$

where $\mathcal{CN}(x; a, b) = (\pi b)^{-1} \exp(-|x - a|^2/b)$ denotes the complex-Gaussian pdf, $\delta(\cdot)$ the Dirac delta, $\lambda_j = \Pr\{X_j \neq 0\}$ the sparsity rate, and $\mu_j$ the variance of the Gaussian (active) component. We furthermore assume that the channel is energy-preserving with an exponential delay-power profile, so that $\mu_j = 2^{-j/L_{\text{hpd}}} \big/ \sum_{l=0}^{L-1} \lambda_l\, 2^{-l/L_{\text{hpd}}}$, where $L_{\text{hpd}}$ denotes the half-power delay. For simplicity, we assume a uniform sparsity rate $\lambda = \lambda_j\ \forall j$.
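To make the prior (4) and the energy normalization concrete, the following sketch draws one Bernoulli-Gaussian impulse response with an exponential delay-power profile. The half-power delay $L_{\text{hpd}} = 64$ and the function names are hypothetical choices; $L = 256$ and $\lambda = 1/4$ match Section IV.

```python
import numpy as np

def exponential_pdp_variances(L, lam, L_hpd):
    """Active-tap variances mu_j proportional to 2^(-j/L_hpd), scaled so that sum_j lam*mu_j = 1."""
    profile = 2.0 ** (-np.arange(L) / L_hpd)
    return profile / (lam * profile.sum())

def draw_bg_channel(L, lam, L_hpd, rng):
    """One draw from the Bernoulli-Gaussian prior: x_j = 0 w.p. 1-lam, else x_j ~ CN(0, mu_j)."""
    mu = exponential_pdp_variances(L, lam, L_hpd)
    active = rng.random(L) < lam
    g = np.sqrt(mu / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
    return np.where(active, g, 0.0), mu

rng = np.random.default_rng(1)
L, lam, L_hpd = 256, 0.25, 64          # L_hpd = 64 is a hypothetical choice
x, mu = draw_bg_channel(L, lam, L_hpd, rng)
print(round(float(np.sum(lam * mu)), 6))   # 1.0: expected channel energy
```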

The presence of a Dirac delta in (4) indicates that we assume a "perfectly sparse" channel model. Although perfect sparsity is not expected to manifest in practice, it is frequently assumed in the literature (see, e.g., [4] and the papers cited therein). While the JCED algorithm proposed in Section III-C can handle generic marginal priors $p_{X_j}(x)$, we make the perfect sparsity assumption only to facilitate a direct comparison to the information-theoretic result (1) from [1]. In follow-on work [24], [25], we consider channel taps that are both clustered and non-perfectly sparse, as motivated by the IEEE 802.15.4a model in combination with raised-cosine pulse shapes.

Fig. 1. The factor graph of the JCED problem for a toy example with $N = 4$ OFDM subcarriers, $M = 2$ bits per QAM symbol, $M_I = 3$ information bits, $M_T = 2$ training bits, $N_p = 1$ pilot subcarrier (at subcarrier index $i = 3$), and a channel with length $L = 3$.

III. BP-Based Joint Channel Estimation and Decoding

Our goal is to infer the information bits $b$, given the OFDM observations $y = (y_0, \dots, y_{N-1})^T$ and the pilot/training bits $c_{pt}$, in the absence of channel state information. For simplicity, we assume that the channel statistics (i.e., $\{\mu_j\}$, $\lambda$, $L_{\text{hpd}}$, and $\mu^v$) are known⁶. In particular, we aim to compute the posterior pmf $p(b_m \mid y, c_{pt})$ of each information bit $b_m$ and to decide each bit by maximizing it. Given the model of Section II, this posterior can be decomposed into a product of factors as follows:

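$$ p(b_m \mid y, c_{pt}) = \sum_{b_{\setminus m}} p(b \mid y, c_{pt}) \propto \sum_{b_{\setminus m}} p(y \mid b, c_{pt})\, p(b) \tag{5} $$

$$ = \int_x \sum_s \sum_c \sum_{b_{\setminus m}} p(y \mid s, x)\, p(x)\, p(s \mid c)\, p(c \mid b, c_{pt})\, p(b) \tag{6} $$

$$ = \int_x \prod_{j=0}^{L-1} p(x_j) \sum_s \prod_{i=0}^{N-1} p(y_i \mid s_i, x) \sum_c \prod_{i=0}^{N-1} p(s_i \mid c_i) \sum_{b_{\setminus m}} p(c \mid b, c_{pt}) \prod_{m'=1}^{M_I} p(b_{m'}), \tag{7} $$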

⁶ Although it remains outside the scope of this work, it should be possible to jointly estimate these statistics together with
the channel and data realizations by treating them as random variables with appropriate non-informative priors and expanding 
the factor graph accordingly. 
Fig. 2. Examples of belief propagation among nodes of a factor graph.

where "$\propto$" denotes equality up to a scaling and where $b_{\setminus m}$ denotes the vector $b$ with the $m$th element omitted. The factorization (7) is illustrated by the factor graph in Fig. 1, where the round nodes represent random variables and the square nodes represent the factors of the posterior identified in (7).

A. Background on belief propagation 

While exact evaluation of the posteriors $\{p(b_m \mid y, c_{pt})\}_{m=1}^{M_I}$ is computationally impractical for the problem sizes of interest, these posteriors can be approximately evaluated using belief propagation (BP) [7] on a loopy factor graph like the one illustrated in Fig. 1. In standard BP, beliefs take the form of pdfs/pmfs that are propagated among nodes of the factor graph according to the rules of the sum-product algorithm (SPA):

1) If factor node $f(v_1, \dots, v_A)$ is connected to variable nodes $\{v_a\}_{a=1}^{A}$, then the belief passed from $f$ to $v_b$ is

$$ p_{f \to v_b}(v_b) \propto \int_{\{v_a\}_{a \neq b}} f(v_1, \dots, v_A) \prod_{a \neq b} p_{v_a \to f}(v_a), \tag{8} $$

where $\{p_{v_a \to f}(\cdot)\}_{a \neq b}$ are the beliefs most recently passed to $f$ from $\{v_a\}_{a \neq b}$.

2) If variable node $v$ is connected to factor nodes $\{f_1, \dots, f_B\}$, then the belief passed from $v$ to $f_a$ is

$$ p_{v \to f_a}(v) \propto \prod_{b \neq a} p_{f_b \to v}(v), \tag{9} $$

where $\{p_{f_b \to v}(\cdot)\}_{b \neq a}$ are the beliefs most recently passed to $v$ from $\{f_b\}_{b \neq a}$.

3) If variable node $v$ is connected to factor nodes $\{f_1, \dots, f_B\}$, then the posterior on $v$ is the product of all most recently arriving beliefs, i.e.,

$$ p(v) \propto \prod_{b=1}^{B} p_{f_b \to v}(v). \tag{10} $$

Figure 2 is provided to illustrate the first two rules.
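The three rules can be exercised on a tiny loop-free graph, where they should reproduce the exact marginals. The sketch below uses a hypothetical two-variable graph with arbitrary factor tables, applies rules 1) to 3) by hand, and compares the resulting posteriors against brute-force marginalization.

```python
import numpy as np

# Hypothetical loop-free toy graph:  g1 -- v1 -- f -- v2 -- g2   (binary v1, v2)
g1 = np.array([0.7, 0.3])                 # unary factor on v1
g2 = np.array([0.4, 0.6])                 # unary factor on v2
f = np.array([[0.9, 0.1],                 # pairwise factor, indexed f[v1, v2]
              [0.2, 0.8]])

def normalize(p):
    return p / p.sum()

# Rule 2: a variable forwards the product of beliefs from its *other* factors.
v1_to_f = normalize(g1)                   # v1's only other neighbor is g1
v2_to_f = normalize(g2)
# Rule 1: a factor sums out every variable except the destination.
f_to_v2 = normalize(f.T @ v1_to_f)        # sum over v1 of f(v1, v2) * p_{v1->f}(v1)
f_to_v1 = normalize(f @ v2_to_f)          # sum over v2 of f(v1, v2) * p_{v2->f}(v2)
# Rule 3: posterior = product of all arriving beliefs (a unary factor's message is its table).
post_v1 = normalize(g1 * f_to_v1)
post_v2 = normalize(g2 * f_to_v2)

# Brute-force marginals of g1(v1) f(v1, v2) g2(v2) for comparison.
joint = g1[:, None] * f * g2[None, :]
print(np.allclose(post_v1, normalize(joint.sum(axis=1))),
      np.allclose(post_v2, normalize(joint.sum(axis=0))))   # True True
```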

When the factor graph contains no loops, SPA-BP yields exact posteriors after only two rounds of message passing (i.e., forward and backward passes). However, with loops in the factor graph, convergence to the exact posteriors is not guaranteed, since exact inference is known to be NP-hard [26]. That said, there exist many problems to which loopy BP has been successfully applied, including inference on Markov random fields [27], multiuser detection [8], [28], turbo decoding [29], LDPC decoding [10], and compressed sensing [9], [30], [31]. Our work not only leverages these past successes, but unites the last two through "turbo" message scheduling on a larger factor graph [32].



B. Background on RBP 

A sub-problem of particular interest to us is the estimation of a non-Gaussian vector $x$ that is linearly mixed to form $z = \Phi x$ and subsequently observed as $y$ through componentwise non-Gaussian measurement channels $\{p_{Y_i|Z_i}(y_i \mid z_i)\}_{i=0}^{N-1}$. In our case, (4) specifies the non-Gaussian prior on $x$ and (3) yields the non-Gaussian measurement channels (where the non-Gaussianity results from the inherent uncertainty on the data symbols $s_i$). This sub-problem yields the factor graph shown within the right dashed box in Fig. 1, where the nodes "$y_i$" represent the measurements and the rightmost nodes represent the prior on $x$.

Building on prior multiuser detection work by Guo and Wang [8], Rangan recently proposed a so-called "relaxed BP" (RBP) scheme [9] that yields asymptotically exact posteriors as $N, L \to \infty$ (under some additional technical conditions on $\Phi$). The main ideas behind RBP are the following. First, although the beliefs flowing leftward from the nodes $\{x_j\}$ are clearly non-Gaussian, the corresponding belief about $z_i = \sum_{j=0}^{L-1} \Phi_{ij} x_j$ can be accurately approximated as Gaussian, when $L$ is large, using the central limit theorem. Moreover, to calculate the parameters of this distribution (i.e., its mean and variance), only the mean and variance of each $x_j$ are needed. Thus, it suffices to pass only means and variances leftward from each $x_j$ node. It is similarly desirable to pass only means and variances rightward from each measurement node. Although the exact rightward-flowing beliefs would be non-Gaussian (due to the non-Gaussian measurement channels $p_{Y_i|Z_i}$), RBP approximates them as Gaussian using a second-order Taylor series, and passes only the resulting means and variances. A further simplification employed by RBP is to approximate the differences among the outgoing means/variances of each left node, and the incoming means/variances of each right node, using Taylor series. The RBP algorithm⁷ is summarized in Table I. Assuming (D1)-(D6) can be calculated efficiently (as is the case in our problem), the complexity of RBP is $O(NL)$.
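The central-limit argument can be checked numerically. In this sketch, the sizes and the flat active-tap variance are hypothetical choices. The predicted mean and variance of $z_i$ use only $\mathrm{E}\{X_j\}$ and $\mathrm{var}\{X_j\}$. Monte Carlo draws then confirm the variance and show that $z_i$ has the fourth-moment ratio $\mathrm{E}|z|^4/(\mathrm{E}|z|^2)^2 \approx 2$ of a circular complex Gaussian, even though each Bernoulli-Gaussian tap has ratio $2/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, lam, n_draws = 1021, 256, 0.25, 4000          # hypothetical sizes
mu = np.full(L, 1.0 / (lam * L))                     # flat active-tap variance; sum_j lam*mu_j = 1
i = 5                                                # look at a single subcarrier
phi_i = np.exp(-2j * np.pi * i * np.arange(L) / N)   # row i of Phi

# Gaussian approximation of z_i: only per-tap means and variances are needed.
x_mean = np.zeros(L)                                 # E{X_j} = 0 under the Bernoulli-Gaussian prior
x_var = lam * mu                                     # var{X_j} = lam * mu_j
z_mean = phi_i @ x_mean
z_var = float(np.sum(np.abs(phi_i) ** 2 * x_var))

# Monte Carlo draws of z_i built from genuinely non-Gaussian taps.
active = rng.random((n_draws, L)) < lam
g = np.sqrt(mu / 2) * (rng.standard_normal((n_draws, L)) + 1j * rng.standard_normal((n_draws, L)))
z_samples = (active * g) @ phi_i
power = np.abs(z_samples) ** 2
kurt_z = np.mean(power ** 2) / np.mean(power) ** 2   # equals 2 for a circular complex Gaussian
kurt_x = 2.0 / lam                                   # same ratio for a single Bernoulli-Gaussian tap

# Expect z_var close to 1, empirical power close to 1, and kurt_z near 2 versus kurt_x = 8.
print(z_var, float(np.mean(power)), float(kurt_z), kurt_x)
```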

⁷ To be precise, the RBP algorithm in Table I is an extension of that proposed in [9]: Table I handles complex Gaussian distributions and non-identically distributed signal and measurement channels.
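definitions:

$$ p_{Z_i|Y_i}(z \mid y; \hat{z}, \mu^z) = \frac{p_{Y_i|Z_i}(y \mid z)\, \mathcal{CN}(z; \hat{z}, \mu^z)}{\int_{z'} p_{Y_i|Z_i}(y \mid z')\, \mathcal{CN}(z'; \hat{z}, \mu^z)} \tag{D1} $$

$$ F_{\text{out},i}(y, \hat{z}, \mu^z) = \int_{z} z\, p_{Z_i|Y_i}(z \mid y; \hat{z}, \mu^z) \tag{D2} $$

$$ \mathcal{E}_{\text{out},i}(y, \hat{z}, \mu^z) = \int_{z} \big| z - F_{\text{out},i}(y, \hat{z}, \mu^z) \big|^2 p_{Z_i|Y_i}(z \mid y; \hat{z}, \mu^z) \tag{D3} $$

$$ p_{Q_j}(x; q, \mu^q) = \frac{p_{X_j}(x)\, \mathcal{CN}(x; q, \mu^q)}{\int_{x'} p_{X_j}(x')\, \mathcal{CN}(x'; q, \mu^q)} \tag{D4} $$

$$ F_{\text{in},j}(q, \mu^q) = \int_{x} x\, p_{Q_j}(x; q, \mu^q) \tag{D5} $$

$$ \mathcal{E}_{\text{in},j}(q, \mu^q) = \int_{x} \big| x - F_{\text{in},j}(q, \mu^q) \big|^2 p_{Q_j}(x; q, \mu^q) \tag{D6} $$

initialize:

$$ \forall i,j: \quad \hat{x}_{ij}[1] = \mathrm{E}\{X_j\} \tag{I1} $$

$$ \forall j: \quad \mu^x_j[1] = \mathrm{var}\{X_j\} \tag{I2} $$

for $n = 1, 2, 3, \dots$

$$ \forall i: \quad \mu^z_i[n] = \sum_{j=0}^{L-1} |\Phi_{ij}|^2 \mu^x_j[n] \tag{R1} $$

$$ \forall i: \quad \hat{z}_i[n] = \sum_{j=0}^{L-1} \Phi_{ij} \hat{x}_{ij}[n] \tag{R2} $$

$$ \forall i,j: \quad \hat{z}_{ij}[n] = \hat{z}_i[n] - \Phi_{ij} \hat{x}_{ij}[n] \tag{R3} $$

$$ \forall i: \quad \mu^u_i[n] = \mathcal{E}_{\text{out},i}(y_i, \hat{z}_i[n], \mu^z_i[n]) \tag{R4} $$

$$ \forall i,j: \quad \hat{e}_{ij}[n] = F_{\text{out},i}(y_i, \hat{z}_i[n], \mu^z_i[n]) - \hat{z}_{ij}[n] - \Phi_{ij} \hat{x}_{ij}[n]\, \mu^u_i[n]/\mu^z_i[n] \tag{R5} $$

$$ \forall i: \quad \mu^e_i[n] = \big(1 - \mu^u_i[n]/\mu^z_i[n]\big)^{-1} \mu^z_i[n] \tag{R6} $$

$$ \forall i,j: \quad \hat{u}_{ij}[n] = \big(1 - \mu^u_i[n]/\mu^z_i[n]\big)^{-1} \hat{e}_{ij}[n] \tag{R7} $$

$$ \forall j: \quad \mu^q_j[n] = \Big( \sum_{i=0}^{N-1} |\Phi_{ij}|^2 / \mu^e_i[n] \Big)^{-1} \tag{R8} $$

$$ \forall j: \quad \hat{q}_j[n] = \mu^q_j[n] \sum_{i=0}^{N-1} \Phi^*_{ij} \hat{u}_{ij}[n] / \mu^e_i[n] \tag{R9} $$

$$ \forall j: \quad \hat{x}_j[n+1] = F_{\text{in},j}(\hat{q}_j[n], \mu^q_j[n]) \tag{R10} $$

$$ \forall j: \quad \mu^x_j[n+1] = \mathcal{E}_{\text{in},j}(\hat{q}_j[n], \mu^q_j[n]) \tag{R11} $$

$$ \forall i,j: \quad \hat{x}_{ij}[n+1] = \hat{x}_j[n+1] - \Phi^*_{ij} \hat{u}_{ij}[n]\, \mu^x_j[n+1] / \mu^e_i[n] \tag{R12} $$

end

TABLE I

The RBP Algorithm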



C. BP-based joint channel estimation and decoding 

In this section, we detail our BP-based approach to JCED, frequently referring to the factor graph in Fig. 1. Note that, since our factor graph is loopy, there exists considerable freedom in the message-passing schedule. We choose to propagate beliefs from the left to the right and back again, several times, stopping as soon as the beliefs appear to have converged. Each full cycle of message passing on the overall factor graph will be referred to as a "turbo iteration." During each turbo iteration, several rounds of message passing are performed within each of the dashed boxes in Fig. 1. We refer to the iterations within the left dashed box as "SISO decoder iterations" and the iterations within the right dashed box as "RBP iterations." Below, we provide details on how beliefs are calculated and propagated.

At the very start, nothing is known about the information bits, which are assumed a priori to be equally likely (i.e., $\Pr\{b_m = 1\} = \frac{1}{2}\ \forall m$). Thus, the bit beliefs that initially flow rightward out of the coding/interleaving block are uniform (i.e., $p_{c_{i,m} \to \mathcal{M}_i}(c) = \frac{1}{2}$ for all indices $(i,m)$ corresponding to data bits). Meanwhile, the values of the pilot/training bits are known with certainty, and so $p_{c_{i,m} \to \mathcal{M}_i}(c) = 1$ for $c = c_{i,m}$ (and zero otherwise).

Next, coded-bit beliefs are propagated rightward into the symbol mapping nodes. Since the symbol mapping is deterministic, the pmf factors take the form $p(s^{(k)} \mid c^{(l)}) = \delta_{k-l}$, where $\{\delta_k\}_{k \in \mathbb{Z}}$ denotes the Kronecker delta sequence. According to the SPA, the message passed rightward from symbol mapping node "$\mathcal{M}_i$" takes the form

$$ p_{\mathcal{M}_i \to s_i}(s^{(k)}) \propto \sum_{c \in \{0,1\}^M} p(s^{(k)} \mid c) \prod_{m=1}^{M} p_{c_{i,m} \to \mathcal{M}_i}(c_m) \tag{11} $$

$$ = \prod_{m=1}^{M} p_{c_{i,m} \to \mathcal{M}_i}(c^{(k)}_m). \tag{12} $$

The SPA then implies that the same message is passed rightward from node $s_i$ (i.e., $p_{s_i \to y_i}(s^{(k)}) = p_{\mathcal{M}_i \to s_i}(s^{(k)})$).
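Equation (12) amounts to multiplying, for each constellation point, the incoming beliefs of the bits in its label. The following sketch assumes the symbol mapping is given as a table of bit labels. The Gray-labelled 4-QAM ordering and the function name are hypothetical examples.

```python
import numpy as np

def symbol_beliefs_from_bits(bit_labels, p_one):
    """Eq. (12): p_{M_i->s_i}(s^(k)) = prod_m p_{c_{i,m}->M_i}(c_m^(k)).

    bit_labels: (2^M, M) array with bit_labels[k, m] = c_m^(k) (the symbol mapping).
    p_one:      length-M array of incoming beliefs Pr{c_{i,m} = 1}.
    """
    probs = np.where(bit_labels == 1, p_one[None, :], 1.0 - p_one[None, :])
    return probs.prod(axis=1)

# Hypothetical Gray-labelled 4-QAM mapping: symbol index k -> bits (c_1, c_2).
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
print(symbol_beliefs_from_bits(labels, np.array([0.5, 0.5])))   # uniform data bits: all 0.25
print(symbol_beliefs_from_bits(labels, np.array([1.0, 0.5])))   # first bit known to be 1 (training bit)
```

A known training bit zeroes out every symbol whose label disagrees with it, which is how partially known symbols enter the factor graph.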

Recall, from the discussion of RBP, that the belief propagating rightward into the OFDM observation node "$y_i$" determines RBP's $i$th measurement pdf $p_{Y_i|Z_i}(y \mid z)$. Writing this belief as $\beta_i^{(k)} = p_{s_i \to y_i}(s^{(k)})$, (3) implies a Gaussian-mixture channel of the form

$$ p_{Y_i|Z_i}(y \mid z) = \sum_{k=1}^{2^M} \beta_i^{(k)}\, \mathcal{CN}(y; s^{(k)} z, \mu^v). \tag{13} $$

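From (13), it can be shown (see Appendix A) that the quantities (D2)-(D3) in Table I become

$$ F_{\text{out},i}(y, \hat{z}, \mu^z) = \hat{z} + \epsilon_i(y, \hat{z}, \mu^z) \tag{14} $$

$$ \mathcal{E}_{\text{out},i}(y, \hat{z}, \mu^z) = \sum_{k=1}^{2^M} \xi_i^{(k)}(y, \hat{z}, \mu^z) \Big( \big|\epsilon_i^{(k)}(y, \hat{z}, \mu^z)\big|^2 + \frac{\mu^z \mu^v}{|s^{(k)}|^2 \mu^z + \mu^v} \Big) - \big|\epsilon_i(y, \hat{z}, \mu^z)\big|^2 \tag{15} $$

for

$$ \xi_i^{(k)}(y, \hat{z}, \mu^z) = \frac{\beta_i^{(k)}\, \mathcal{CN}(y; s^{(k)} \hat{z}, |s^{(k)}|^2 \mu^z + \mu^v)}{\sum_{k'=1}^{2^M} \beta_i^{(k')}\, \mathcal{CN}(y; s^{(k')} \hat{z}, |s^{(k')}|^2 \mu^z + \mu^v)} \tag{16} $$

$$ \epsilon_i^{(k)}(y, \hat{z}, \mu^z) = \frac{(s^{(k)})^* \mu^z}{|s^{(k)}|^2 \mu^z + \mu^v} \big( y - s^{(k)} \hat{z} \big) \tag{17} $$

$$ \epsilon_i(y, \hat{z}, \mu^z) = \sum_{k=1}^{2^M} \xi_i^{(k)}(y, \hat{z}, \mu^z)\, \epsilon_i^{(k)}(y, \hat{z}, \mu^z). \tag{18} $$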
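The Gaussian-mixture output computations (D2)-(D3) under (13) can be sketched as follows. Each constellation point is weighted by its prior belief times the marginal likelihood of $y$; the per-symbol LMMSE estimates of $z$ are then averaged, and their spread is added to the variance. The function name and the QPSK example values are assumptions for illustration, and the log-domain weights are used only for numerical stability.

```python
import numpy as np

def gm_output(y, z_hat, mu_z, symbols, beta, mu_v):
    """Posterior mean/variance of z and posterior pmf of s for the Gaussian-mixture channel (13).

    Model: y = s*z + v with v ~ CN(0, mu_v), Pr{s = symbols[k]} = beta[k], prior z ~ CN(z_hat, mu_z).
    """
    s = np.asarray(symbols, dtype=complex)
    var_k = np.abs(s) ** 2 * mu_z + mu_v               # variance of y given s = s^(k)
    resid = y - s * z_hat
    with np.errstate(divide="ignore"):                 # allow beta[k] = 0 (log -> -inf)
        log_w = np.log(beta) - np.log(np.pi * var_k) - np.abs(resid) ** 2 / var_k
    xi = np.exp(log_w - np.max(log_w))                 # posterior symbol pmf
    xi /= xi.sum()
    eps_k = np.conj(s) * mu_z * resid / var_k          # per-symbol LMMSE correction to z_hat
    eps = np.sum(xi * eps_k)
    cond_var = mu_z * mu_v / var_k                     # var of z given y and s = s^(k)
    F_out = z_hat + eps
    E_out = np.sum(xi * (np.abs(eps_k) ** 2 + cond_var)) - np.abs(eps) ** 2
    return F_out, E_out, xi


qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
F, E, xi = gm_output(y=0.8 + 0.6j, z_hat=1.0, mu_z=0.1, symbols=qpsk, beta=np.full(4, 0.25), mu_v=0.05)
print(F, E, np.round(xi, 3))   # xi concentrates on the first symbol, (1+1j)/sqrt(2)
```

With a single symbol of probability one, the mixture collapses and the function reduces to the scalar LMMSE update, which is a convenient sanity check.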
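The quantities in (14)-(18) can be interpreted as follows. Given the observation $y_i = y$ and assuming the prior $z_i \sim \mathcal{CN}(\hat{z}, \mu^z)$ on the subcarrier gain $z_i$, the quantity $F_{\text{out},i}(y, \hat{z}, \mu^z)$ is the MMSE estimate of $z_i$, $\mathcal{E}_{\text{out},i}(y, \hat{z}, \mu^z)$ is its variance, and $\{\xi_i^{(k)}(y, \hat{z}, \mu^z)\}_{k=1}^{2^M}$ is the posterior pmf of $s_i$. Likewise, from (4), it can be shown (see Appendix B) that the quantities (D5)-(D6) in Table I take the form

$$ F_{\text{in},j}(q, \mu^q) = \pi_j(q, \mu^q)\, \gamma_j(q, \mu^q) \tag{19} $$

$$ \mathcal{E}_{\text{in},j}(q, \mu^q) = \pi_j(q, \mu^q) \big( \nu_j(\mu^q) + |\gamma_j(q, \mu^q)|^2 \big) - \pi_j(q, \mu^q)^2\, |\gamma_j(q, \mu^q)|^2 \tag{20} $$

for

$$ \pi_j(q, \mu^q) = \Big( 1 + \frac{1 - \lambda}{\lambda}\, \frac{\mu_j + \mu^q}{\mu^q} \exp\Big( -\frac{|q|^2 \mu_j}{\mu^q (\mu_j + \mu^q)} \Big) \Big)^{-1} \tag{21} $$

$$ \gamma_j(q, \mu^q) = \frac{\mu_j}{\mu_j + \mu^q}\, q \tag{22} $$

$$ \nu_j(\mu^q) = \frac{\mu_j \mu^q}{\mu_j + \mu^q}. \tag{23} $$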

The quantity $F_{\text{in},j}$ from (19) can be interpreted as the MMSE estimate of the channel tap $x_j$ given the observations $y$ and the pilots $c_{pt}$, and the quantity $\mathcal{E}_{\text{in},j}$ from (20) can be interpreted as its variance.
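A matching sketch of the Bernoulli-Gaussian input computations (D5)-(D6) follows, assuming RBP's pseudo-observation $q = x_j + w$ with $w \sim \mathcal{CN}(0, \mu^q)$. The variance is written as $\pi\nu + \pi(1-\pi)|\gamma|^2$. This is algebraically the same as $\pi(\nu + |\gamma|^2) - \pi^2|\gamma|^2$, but the first form cannot go negative through round-off. Names and example inputs are hypothetical.

```python
import numpy as np

def bg_input(q, mu_q, lam, mu_x):
    """Posterior mean, variance, and activity probability of x from q = x + CN(0, mu_q),
    under the Bernoulli-Gaussian prior lam*CN(0, mu_x) + (1 - lam)*delta."""
    q = np.asarray(q, dtype=complex)
    gamma = q * mu_x / (mu_x + mu_q)          # posterior mean given the tap is active
    nu = mu_x * mu_q / (mu_x + mu_q)          # posterior variance given the tap is active
    # CN(q; 0, mu_q) / CN(q; 0, mu_x + mu_q); the exponent is <= 0, so no overflow
    ratio = (mu_x + mu_q) / mu_q * np.exp(-np.abs(q) ** 2 * mu_x / (mu_q * (mu_x + mu_q)))
    pi = 1.0 / (1.0 + (1.0 - lam) / lam * ratio)   # Pr{x != 0 | q}
    F_in = pi * gamma
    E_in = pi * nu + pi * (1.0 - pi) * np.abs(gamma) ** 2
    return F_in, E_in, pi


F, E, pi = bg_input(np.array([0.0, 0.05, 1.0]), mu_q=0.01, lam=0.25, mu_x=1.0)
print(np.round(pi, 3))   # tiny activity probability near q = 0, essentially 1 for q = 1
```

Setting $\lambda = 1$ removes the spike at zero and recovers the linear Gaussian (LMMSE) denoiser, which the test below uses as a reference.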

Using the quantities derived in (14)-(23), the RBP algorithm in Table I is iterated until convergence is detected. Doing so generates approximately conditional-mean (i.e., nonlinear MMSE) estimates $\{\hat{x}_j\}$ of the sparse-channel impulse-response coefficients $\{x_j\}$, as well as their conditional variances $\{\mu^x_j\}$, based on the observations $\{y_i\}$ and the soft symbol estimates $\{\beta_i^{(k)}\}$. Conveniently, RBP also returns (a close approximation to) the conditional-mean estimates $\{\hat{z}_i\}$ of the subchannel gains $\{z_i\}$, as well as their conditional variances $\{\mu^z_i\}$.

Before continuing, we discuss some RBP details that are specific to our JCED application. First, we notice that the condition $\mu^u_i[n] < \mu^z_i[n]$ is required to guarantee a positive value of the variance $\mu^e_i[n]$ in (R6). Intuitively, we might expect that $\mu^u_i[n] < \mu^z_i[n]$, because $\mu^u_i[n] = \mathcal{E}_{\text{out},i}(y_i, \hat{z}_i[n], \mu^z_i[n])$ is a posterior variance and $\mu^z_i[n]$ a prior variance. However, this is not necessarily the case during the first few RBP iterations, when the soft channel and symbol estimates may be inaccurate. We remedy this situation by clipping $\mu^u_i[n]$ at the value $0.99\,\mu^z_i[n]$, where 0.99 was chosen heuristically. Second, due to the DFT matrix property $|\Phi_{ij}|^2 = 1\ \forall i,j$, step (R1) in Table I simplifies to $\mu^z_i[n] = \mu^z[n] = \sum_{j=0}^{L-1} \mu^x_j[n]$, and (R8) simplifies to $\mu^q_j[n] = \mu^q[n] = \big( \sum_{i=0}^{N-1} 1/\mu^e_i[n] \big)^{-1}$. With these simplifications, the complexity of RBP is dominated by the computation of the elementwise matrix products $\Phi_{ij}\hat{x}_{ij}$ and $\Phi^*_{ij}\hat{u}_{ij}$, which must each be calculated once per RBP iteration, as well as three other elementwise matrix products in (R5), (R7), and (R12). Thus, RBP requires only $\approx 5NL$ multiplies per iteration.
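To show how the RBP steps and the two simplifications above fit together, here is a compact NumPy sketch of one reading of the per-edge iterations (R1)-(R12). It covers a deliberately simplified case: every subcarrier carries a known QPSK symbol, so the measurement channel is Gaussian rather than the mixture (13), and no SISO decoding is involved. The sketch uses $|\Phi_{ij}|^2 = 1$ in (R1) and (R8) and applies the 0.99 clipping. The setup values, the flat delay profile, the iteration count, and all function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def bg_input(q, mu_q, lam, mu_x):
    # Posterior mean/variance of x_j from q_j = x_j + CN(0, mu_q) under the Bernoulli-Gaussian prior
    gamma = q * mu_x / (mu_x + mu_q)
    nu = mu_x * mu_q / (mu_x + mu_q)
    ratio = (mu_x + mu_q) / mu_q * np.exp(-np.abs(q) ** 2 * mu_x / (mu_q * (mu_x + mu_q)))
    pi = 1.0 / (1.0 + (1.0 - lam) / lam * ratio)
    return pi * gamma, pi * nu + pi * (1.0 - pi) * np.abs(gamma) ** 2

def known_symbol_output(y, s, z_hat, mu_z, mu_v):
    # Posterior mean/variance of z_i for y_i = s_i z_i + v_i when s_i is known (pilot-only case)
    var_y = np.abs(s) ** 2 * mu_z + mu_v
    return z_hat + np.conj(s) * mu_z * (y - s * z_hat) / var_y, mu_z * mu_v / var_y

def rbp_channel_estimate(y, s, L, lam, mu_x, mu_v, n_iter=15):
    N = y.size
    Phi = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
    x_ij = np.zeros((N, L), dtype=complex)                  # (I1): E{X_j} = 0
    var_x = lam * mu_x                                      # (I2): var{X_j}
    for _ in range(n_iter):
        mu_z = np.full(N, var_x.sum())                      # (R1) with |Phi_ij|^2 = 1
        Phi_x = Phi * x_ij                                  # elementwise Phi_ij * x_ij
        z_hat = Phi_x.sum(axis=1)                           # (R2)
        z_ij = z_hat[:, None] - Phi_x                       # (R3)
        F_out, E_out = known_symbol_output(y, s, z_hat, mu_z, mu_v)
        mu_u = np.minimum(E_out, 0.99 * mu_z)               # (R4), clipped at 0.99 * mu_z
        r = mu_u / mu_z
        e_ij = F_out[:, None] - z_ij - Phi_x * r[:, None]   # (R5)
        mu_e = mu_z / (1.0 - r)                             # (R6)
        u_ij = e_ij / (1.0 - r)[:, None]                    # (R7)
        mu_q = 1.0 / np.sum(1.0 / mu_e)                     # (R8) with |Phi_ij|^2 = 1
        Phic_u = np.conj(Phi) * u_ij                        # elementwise conj(Phi_ij) * u_ij
        q_hat = mu_q * np.sum(Phic_u / mu_e[:, None], axis=0)              # (R9)
        x_hat, var_x = bg_input(q_hat, mu_q, lam, mu_x)                    # (R10), (R11)
        x_ij = x_hat[None, :] - Phic_u * (var_x[None, :] / mu_e[:, None])  # (R12)
    return x_hat, var_x


# Hypothetical setup: all N subcarriers carry known QPSK symbols, SNR = 20 dB.
rng = np.random.default_rng(3)
N, L, lam, mu_v = 1021, 256, 0.25, 0.01
mu_x = np.full(L, 1.0 / (lam * L))                          # flat profile, unit expected energy
active = rng.random(L) < lam
x = np.where(active, np.sqrt(mu_x / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L)), 0)
s = rng.choice(np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2), size=N)
z = np.fft.fft(x, n=N)                                      # z = Phi x (zero-padded DFT)
y = s * z + np.sqrt(mu_v / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
x_hat, var_x = rbp_channel_estimate(y, s, L, lam, mu_x, mu_v)
nmse = np.sum(np.abs(x_hat - x) ** 2) / np.sum(np.abs(x) ** 2)
print(f"NMSE = {10 * np.log10(nmse):.1f} dB")
```

Only the elementwise products `Phi * x_ij` and `np.conj(Phi) * u_ij`, plus the scalings in (R5), (R7), and (R12), touch all $NL$ edges, which is where the per-iteration cost of roughly $5NL$ multiplies comes from.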
After RBP converges, updated symbol beliefs are passed leftward out of the RBP sub-graph. According to the SPA, the belief propagating leftward from the $y_i$ node takes the form

$$ p_{s_i \leftarrow y_i}(s) \propto \int_z \mathcal{CN}(y_i; s z, \mu^v)\, \mathcal{CN}(z; \hat{z}_i, \mu^z_i) \tag{24} $$

$$ = \mathcal{CN}(y_i; s \hat{z}_i, |s|^2 \mu^z_i + \mu^v), \tag{25} $$

where the quantities $(\hat{z}_i, \mu^z_i)$ play the role of soft channel estimates. The SPA then implies that $p_{\mathcal{M}_i \leftarrow s_i}(s) = p_{s_i \leftarrow y_i}(s)$.

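Next, beliefs are passed leftward from each symbol-mapping node $\mathcal{M}_i$ to the corresponding bit nodes $c_{i,m}$. From the SPA, these beliefs take the form

$$ p_{\mathcal{M}_i \to c_{i,m}}(c) \propto \sum_{k=1}^{2^M} \sum_{c':\, c'_m = c} p(s^{(k)} \mid c')\, p_{\mathcal{M}_i \leftarrow s_i}(s^{(k)}) \prod_{m' \neq m} p_{c_{i,m'} \to \mathcal{M}_i}(c'_{m'}) \tag{26} $$

$$ = \frac{1}{p_{c_{i,m} \to \mathcal{M}_i}(c)} \sum_{k:\, c^{(k)}_m = c} p_{\mathcal{M}_i \to s_i}(s^{(k)})\, p_{\mathcal{M}_i \leftarrow s_i}(s^{(k)}) \tag{27} $$

for pairs $(i,m)$ that do not correspond to pilot/training bits. (Since the pilot/training bits are known with certainty, there is no need to update their pmfs.)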
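The leftward computations (25)-(27) can be sketched for a single subcarrier as follows. The Gray-labelled QPSK table, the numbers, and the helper names are hypothetical. The code evaluates the extrinsic bit beliefs in two ways: directly, by summing over symbols with the incoming beliefs of the *other* bits, and in the divided form of (27). The divided form requires $0 < p_{c_{i,m}\to\mathcal{M}_i}(c) < 1$, so it only applies to non-pilot/training bits.

```python
import numpy as np

def symbol_likelihood(y, z_hat, mu_z, symbols, mu_v):
    """Leftward symbol belief, eq. (25): CN(y; s*z_hat, |s|^2*mu_z + mu_v), normalized over the constellation."""
    s = np.asarray(symbols, dtype=complex)
    var = np.abs(s) ** 2 * mu_z + mu_v
    log_p = -np.log(np.pi * var) - np.abs(y - s * z_hat) ** 2 / var
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def bit_beliefs_direct(lik, labels, p_one):
    """Pr{c_m = 1}: sum over k with c_m^(k) = 1 of lik[k] * prod_{m' != m} incoming bit beliefs."""
    M = labels.shape[1]
    out = np.empty(M)
    for m in range(M):
        others = [mm for mm in range(M) if mm != m]
        pri = np.where(labels[:, others] == 1, p_one[others], 1.0 - p_one[others]).prod(axis=1)
        w = lik * pri
        out[m] = w[labels[:, m] == 1].sum() / w.sum()
    return out

def bit_beliefs_eq27(lik, labels, p_one):
    """Same quantity via eq. (27): product over all bits, then divide out the bit's own belief."""
    prior = np.where(labels == 1, p_one, 1.0 - p_one).prod(axis=1)
    out = np.empty(labels.shape[1])
    for m in range(labels.shape[1]):
        ones = labels[:, m] == 1
        a1 = np.sum(prior[ones] * lik[ones]) / p_one[m]
        a0 = np.sum(prior[~ones] * lik[~ones]) / (1.0 - p_one[m])
        out[m] = a1 / (a0 + a1)
    return out


# Hypothetical Gray-labelled QPSK: labels[k] are the bits of symbols[k].
symbols = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
lik = symbol_likelihood(y=0.7 + 0.6j, z_hat=1.0, mu_z=0.02, symbols=symbols, mu_v=0.05)
p_one = np.array([0.3, 0.6])                     # incoming (decoder-fed) bit beliefs
print(bit_beliefs_direct(lik, labels, p_one), bit_beliefs_eq27(lik, labels, p_one))
```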

Finally, messages are passed leftward into the coding/interleaving block. Doing so is equivalent to feeding extrinsic soft bit estimates to a soft-input/soft-output (SISO) deinterleaver/decoder, which treats them as priors. Since SISO decoding is a well-studied topic (see, e.g., [10], [33]) and high-performance implementations are readily available, we will not elaborate on the details here. It suffices to say that, once the extrinsic outputs of the SISO decoder have been computed, they are re-interleaved and passed rightward from the code/interleave block to begin another round of belief propagation on the overall factor graph of Fig. 1. The outer "turbo" iterations then continue until the decoder detects no bit errors, the soft bit estimates have converged, or a maximum number of iterations has elapsed.

IV. Numerical Results 

In this section, we present numerical results that compare our proposed BP-JCED to the CCS approach 
as well as to several reference schemes that act as performance upper/lower bounds. 
A. Setup and reference schemes

The following decoupled channel-estimation and decoding (DCED) procedure was used to implement CCS. First, a LASSO⁸ channel estimate $\hat{x}[t]$ was generated using the pilot subcarriers. To implement LASSO, we used the celebrated SPGL1 algorithm [34] with a genie-optimized tuning parameter⁹. The frequency-domain estimate $\hat{z}[t] = \Phi \hat{x}[t]$ was then computed, from which the (genie-aided empirical) variance $\mu^z[t] = \|z[t] - \hat{z}[t]\|_2^2/N$ was calculated. Using the soft channel estimate $(\hat{z}[t], \mu^z[t])$, leftward SPA-BP on the factor graph in Fig. 1 was performed exactly as described in Section III-C, ensuring that the LASSO outputs were properly combined with SISO decoding. We note that, due to the two genie-aided steps, the performance attained by CCS may be somewhat optimistic. Even so, we shall see that this optimistic CCS performance remains far below that of our BP-based JCED approach (which requires no genie-aided steps).
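For readers who want a CCS-style baseline without SPGL1, the sketch below swaps in plain ISTA (iterative soft thresholding) to solve the same LASSO criterion. It then picks the tuning parameter with the same genie-aided NMSE rule and forms the genie-aided variance $\mu^z$. ISTA, the small problem size ($N = 127$, $L = 32$, $N_p = 32$), and all names are assumptions. SPGL1 solves the problem differently, and its API is not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Complex soft thresholding: shrink each magnitude by t and keep its phase."""
    mag = np.abs(v)
    return np.where(mag > t, (1.0 - t / np.maximum(mag, 1e-300)) * v, 0.0)

def lasso_ista(A, y, tau, n_iter=500):
    """Minimize 0.5*||y - A x||_2^2 + tau*||x||_1 with ISTA (a simple stand-in for SPGL1)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2             # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        x = soft_threshold(x + step * (A.conj().T @ (y - A @ x)), step * tau)
    return x

# Hypothetical small problem: prime N, uniformly spaced pilots, SNR = 20 dB.
rng = np.random.default_rng(4)
N, L, lam, N_p, mu_v = 127, 32, 0.25, 32, 0.01
Phi = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
x = np.where(rng.random(L) < lam,
             np.sqrt(1 / (2 * lam * L)) * (rng.standard_normal(L) + 1j * rng.standard_normal(L)), 0)
pilots = np.arange(N_p) * N // N_p                     # uniformly spaced pilot subcarriers
s_p = rng.choice(np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2), size=N_p)
A = s_p[:, None] * Phi[pilots]                         # y_p = diag(s_p) Phi_p x + v_p
y_p = A @ x + np.sqrt(mu_v / 2) * (rng.standard_normal(N_p) + 1j * rng.standard_normal(N_p))

# Genie-optimized tuning: keep the tau whose estimate is closest to the true channel.
tau_max = np.max(np.abs(A.conj().T @ y_p))             # any tau >= tau_max gives x_hat = 0
candidates = [lasso_ista(A, y_p, tau) for tau in tau_max * np.logspace(-3, 0, 10)]
nmse = [np.sum(np.abs(xh - x) ** 2) / np.sum(np.abs(x) ** 2) for xh in candidates]
x_hat = candidates[int(np.argmin(nmse))]

# Genie-aided empirical variance of the frequency-domain estimate.
z_hat = Phi @ x_hat
mu_z = np.sum(np.abs(Phi @ x - z_hat) ** 2) / N
print(f"best NMSE = {min(nmse):.3g}, mu_z = {mu_z:.3g}")
```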

We now describe several reference schemes, all of which use the DCED procedure described above, but with different channel estimators. The first uses traditional linear MMSE (LMMSE) estimation. Since LMMSE does not exploit channel sparsity, it yields a performance lower bound for any sparsity-leveraging technique. We also consider MMSE-optimal¹⁰ pilot-aided channel estimation under the support-aware genie (SG), reasoning that this yields a performance upper bound for CCS. Finally, we consider MMSE-optimal estimation under a bit- and support-aware genie (BSG). Here, in addition to the channel support being known, all bits (including data bits) are known and used for channel estimation. This latter reference scheme yields a performance upper bound for any implementable DCED or JCED scheme, including our BP-based JCED. Remarkably, we shall see that the performance of our proposed scheme is not far from that of the BSG.
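The observation that a known support makes MMSE estimation linear can be sketched with a single LMMSE routine used twice. The first call uses the sparsity-agnostic prior variances $\lambda\mu_j$ (the LMMSE reference). The second uses variance $\mu_j$ on the true support and zero elsewhere (the SG reference). Sizes, the pilot symbols (set to 1 for brevity), and names are hypothetical. The BSG reference, which would also use the data bits, is not shown.

```python
import numpy as np

def lmmse(A, y, prior_var, mu_v):
    """x_hat = R A^H (A R A^H + mu_v I)^{-1} y for a zero-mean prior with covariance R = diag(prior_var)."""
    RAh = prior_var[:, None] * A.conj().T
    G = A @ RAh + mu_v * np.eye(A.shape[0])
    return RAh @ np.linalg.solve(G, y)

rng = np.random.default_rng(5)
N, L, lam, N_p, mu_v, trials = 127, 32, 0.25, 16, 0.01, 20      # hypothetical sizes
mu = np.full(L, 1.0 / (lam * L))
Phi = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
pilots = np.arange(N_p) * N // N_p
A = Phi[pilots]                                       # pilot symbols set to 1 for brevity
err_lmmse, err_sg = [], []
for _ in range(trials):
    support = rng.random(L) < lam
    if not support.any():                             # skip the (rare) all-zero channel
        continue
    x = np.where(support, np.sqrt(mu / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L)), 0)
    y = A @ x + np.sqrt(mu_v / 2) * (rng.standard_normal(N_p) + 1j * rng.standard_normal(N_p))
    x_lmmse = lmmse(A, y, lam * mu, mu_v)                 # sparsity-agnostic: var{X_j} = lam * mu_j
    x_sg = lmmse(A, y, np.where(support, mu, 0.0), mu_v)  # support-aware genie: Gaussian on support
    nrm = np.sum(np.abs(x) ** 2)
    err_lmmse.append(np.sum(np.abs(x_lmmse - x) ** 2) / nrm)
    err_sg.append(np.sum(np.abs(x_sg - x) ** 2) / nrm)
print(np.mean(err_lmmse), np.mean(err_sg))
```

With fewer pilots than taps ($N_p < L$), the sparsity-agnostic estimate is underdetermined, while the genie only has to resolve the roughly $\lambda L$ active taps.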

For all of our results, we used irregular LDPC codes with codeword length $\approx 10000$ and average column weight 3, generated (and decoded) using the publicly available software [35]. Random interleaving did not seem to have an effect, so no interleaving was employed. For bit-to-symbol mapping, we used multilevel Gray mapping [36], noting recent work [37] that conjectures the optimality of Gray mapping

⁸ The criterion employed by LASSO [5] is equivalent to the one employed in "basis pursuit denoising" [6].

⁹ The performance of LASSO/SPGL1 is highly dependent on the value of a tuning parameter that determines the tradeoff between the estimate's sparsity and the residual's variance. To optimize this tradeoff, for each realization, SPGL1 was invoked over a dense grid of tuning parameters, and the one that minimized NMSE (with respect to the true channel) was chosen.

¹⁰ When the sparse-channel support is known, the non-zero channel coefficients follow a Gaussian prior, and MMSE-optimal estimates can be calculated linearly.
Fig. 3. Channel estimation NMSE versus pilot-to-sparsity ratio $N_p/K$, for SNR = 20 dB, $M_T$ training bits, $\eta = 3$ bpcu, and 64-QAM.



when BICM is used with a strong code. For OFDM, we use $N = 1021$ subcarriers, since a prime $N$ ensures that square/tall submatrices of $\Phi$ will be full-rank¹¹. As described in the sequel, we tested various combinations of pilot subcarriers $N_p$ and interspersed training bits $M_T$. The $N_p$ pilot subcarriers were spaced uniformly and modulated with QAM symbols chosen uniformly at random. The $M_T$ training bits were placed at the most significant bits (MSBs) of uniformly spaced data subcarriers, with values chosen uniformly at random.

Unless otherwise specified, we used length $L = 256$ channels with sparsity rate $\lambda = 1/4$, yielding $\mathrm{E}\{K\} = \lambda L = 64$ non-zero taps on average. All results are averaged over $T = 100$ OFDM symbols.

B. NMSE and BER versus the number of pilot subcarriers 

Figure 3 plots the channel estimation normalized mean-squared error $\mathrm{NMSE} = \|\hat{x}[t] - x[t]\|_2^2 / \|x[t]\|_2^2$ versus the pilot-to-sparsity ratio $N_p/K$ at SNR = 20 dB. As expected, CCS's NMSE falls between that of the LMMSE and SG estimators, and all three decrease monotonically with $N_p/K$. Even after a single turbo iteration, BP-JCED significantly outperforms CCS and, perhaps surprisingly, the SG (when $N_p/K > 3$). The reason for this latter behavior is that, while the SG uses only the $N_p$ pilot subcarriers, BP-JCED uses all $N$ subcarriers, which yields improved performance even though the $N_d = N - N_p$ data symbols

¹¹ Experiments with non-prime $N = 1024$ showed a slight degradation of performance.
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Fig. 4. BER versus pilot ratio Np/K, for SNR = 20dB, Aft = training bits, r? = 3 bpcu, and 64-QAM. 

are known with very little certainty during the first turbo iteration. Figure [3] indicates that, after only 
2 turbo iterations, BP-JCED learns the data symbols well enough to estimate the channel nearly as 
well as the BSG (which knows the data symbols perfectly). The fact that BP-JCED can generate channel 
estimates that are nearly as good as BSG's support-aware estimates attests to the near-optimal compressive 
estimation abilities of RBP 
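The NMSE metric plotted in Fig. 3 is straightforward to compute. In this sketch, per-symbol NMSE values are averaged in linear scale over the T OFDM symbols before conversion to dB; that averaging convention is an assumption, and `nmse` and `average_nmse_db` are hypothetical names.

```python
import math

def nmse(x_hat, x):
    """Normalized MSE ||x_hat - x||^2 / ||x||^2 for one OFDM symbol (linear scale)."""
    err = sum(abs(a - b) ** 2 for a, b in zip(x_hat, x))
    ref = sum(abs(b) ** 2 for b in x)
    return err / ref

def average_nmse_db(pairs):
    """Average per-symbol NMSE over the (x_hat, x) pairs in linear scale, then convert to dB."""
    vals = [nmse(x_hat, x) for x_hat, x in pairs]
    return 10 * math.log10(sum(vals) / len(vals))

# A 10% error on the single active tap gives NMSE = 0.01, i.e. about -20 dB.
print(average_nmse_db([([1.1, 0.0], [1.0, 0.0])]))
```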

Figure 4 plots bit error rate (BER) versus the pilot ratio N_p/K at SNR = 20dB and a fixed spectral efficiency of η = 3 bpcu. The curves exhibit a "notched" shape because, as N_p increases, the code rate R must increase to maintain a fixed spectral efficiency η. Thus, while increasing N_p makes channel estimation easier, the accompanying increase in R makes data decoding more difficult. For CCS, Fig. 4 indicates that N_p = 4K = L is optimal. The SG and BP-JCED curves show a similar notch-like shape, although their notches are much wider. Finally, the degradation of BP-JCED's data-bit estimates at large N_p/K explains the degradation of its channel estimates seen in Fig. 3, since, with JCED, channel estimation is data-directed.
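The notch trade-off can be made concrete with a small calculation. The sketch below uses a hypothetical bit-accounting rule, η = R(M(N - N_p) - M_t)/N with 2^M-QAM data symbols, which may differ in detail from the one used to generate Fig. 4; it shows that, at fixed η, the required code rate R grows as pilots displace data subcarriers. The function name `required_code_rate` is illustrative.

```python
def required_code_rate(eta, N, Np, Mt, M):
    """Code rate R needed to reach spectral efficiency eta (bpcu) in one OFDM symbol.

    Hypothetical accounting: Np of the N subcarriers carry pilots, the other
    N - Np carry 2^M-QAM symbols, and Mt of those M*(N - Np) bit positions
    hold known training bits, so eta * N information bits must fit into
    M*(N - Np) - Mt coded bits.
    """
    coded_bits = M * (N - Np) - Mt
    if coded_bits <= 0:
        raise ValueError("no coded bits left for data")
    return eta * N / coded_bits

# 64-QAM (M = 6), eta = 3 bpcu, N = 1021, as in Fig. 4
for Np in (0, 64, 128, 256):
    print(Np, round(required_code_rate(3, 1021, Np, 0, 6), 3))
```

Under this accounting, R rises from 0.5 with no pilots to about 0.67 at N_p = L = 256, so each added pilot trades easier channel estimation for a weaker code.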

It is interesting to notice that Fig. 4 shows the optimal CCS pilot insertion rate to be the "Nyquist" rate of N_p = L, since, at this pilot rate, CCS is not actually "compressed." To further investigate this behavior, we repeated the experiment using a channel with half the number of active coefficients (i.e., λ = E{K}/L = 1/8) and report the results in Fig. 5. Remarkably, we find the same behavior: CCS again performs best when pilots are inserted at the Nyquist rate of N_p = L. In fact, we repeated this experiment with dozens of other arbitrary combinations of (N, L, λ, SNR, η, M) and always found exactly the same behavior. Our empirical evidence suggests that, generally speaking, decoupled sparse-channel estimation and data decoding work best when pilots are inserted at the Nyquist rate, at least for OFDM signaling under uniform subcarrier power allocation.²

C. Outage rate and the importance of bit-level training 

Figure 6 plots η_{0.001} versus SNR, where η_{0.001} denotes the spectral efficiency (in bpcu) yielding BER = 0.001. The solid-line traces correspond to N_p = 4K = L pilots, M_t = 0 training bits, and 64-QAM, as suggested by Fig. 4. These solid-line traces all display the anticipated high-SNR scaling law (1 - N_p/N) log2(SNR) + O(1), differing only in the O(1) offset term. While, for this setup, we are glad to see BP-JCED performing on par with the BSG, neither attains the desired channel-capacity pre-log factor of (1 - K/N) ≈ 15/16. It turns out that this shortcoming is due to the choice (N_p, M_t) = (L, 0), which was made on behalf of CCS (and not BP-JCED).

To find the optimal choice of (N_p, M_t) for BP-JCED, we constructed the BER plot in Fig. 7. There we see that BP-JCED performs best with (N_p, M_t) = (0, MK), at least in the high-SNR regime. Note that the total number of pilot/training bits used when (N_p, M_t) = (0, MK) is equivalent to K degrees of freedom per fading block, consistent with the channel-capacity pre-log factor. We then evaluated the outage rate of this scheme (with 256-QAM), obtaining the dashed η_{0.001}-vs-SNR trace in Fig. 6, which, remarkably, exhibits the desired pre-log factor of (1 - K/N).

² It would be interesting to see if this behavior persists when the pilot- versus data-subcarrier power allocation is optimized. Such an optimization, however, remains outside the scope of this manuscript.

Fig. 6. BER = 0.001-achieving spectral efficiency η_{0.001} versus SNR. The solid traces used N_p/K = 4, M_t = 0, and 64-QAM, while the dashed trace used N_p = 0, M_t = MK, and 256-QAM.

Fig. 7. log10(BER) versus various combinations of pilot and training rates, for SNR = 20dB, η = 3.75 bpcu, and 256-QAM.
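The pre-log comparison can be illustrated with a heuristic overhead count in which a pilot subcarrier costs one dimension and M_t known bits on 2^M-QAM symbols cost M_t/M symbol-equivalents. This is a back-of-the-envelope sketch, not a capacity computation; `overhead_dims` and `prelog` are hypothetical names, and K is fixed at its mean value of 64.

```python
def overhead_dims(Np, Mt, M):
    """Dimensions per fading block spent on known data (heuristic count).

    One pilot subcarrier costs one complex dimension; Mt known bits spread over
    2^M-QAM data symbols are counted as Mt / M symbol-equivalents.
    """
    return Np + Mt / M

def prelog(Np, Mt, M, N):
    """Pre-log factor 1 - overhead / N implied by that overhead."""
    return 1 - overhead_dims(Np, Mt, M) / N

N, L, K = 1021, 256, 64
print(prelog(L, 0, 6, N))      # CCS-oriented choice (Np, Mt) = (L, 0) with 64-QAM
print(prelog(0, 8 * K, 8, N))  # choice (Np, Mt) = (0, MK) with 256-QAM (M = 8)
print(1 - K / N)               # capacity pre-log factor 1 - K/N
```

Under this count, (N_p, M_t) = (0, MK) spends exactly K dimensions per block, matching 1 - K/N, whereas (L, 0) spends L = 4K dimensions.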
Fig. 8. BER versus E_b/N_0 (= SNR/η), for N_p/K = 4, M_t = 0, η = 0.5 bpcu, and 4-QAM.

Figure 8 plots BER versus E_b/N_0 = SNR/η over a much lower range of SNR. As stated earlier, experiments confirmed that CCS favors (N_p, M_t) = (L, 0) in the low-SNR regime, so this configuration was used to keep CCS competitive, even though it is potentially suboptimal for BP-JCED. Still, Fig. 8 shows that BP-JCED, after only two turbo iterations, beats CCS by 1.8 dB and remains only 0.8 dB away from the BSG.
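Because Fig. 8 is plotted against E_b/N_0 = SNR/η, converting between the two axes is a fixed dB offset of 10 log10(η). A minimal sketch, with hypothetical function names:

```python
import math

def ebn0_from_snr_db(snr_db, eta):
    """E_b/N_0 in dB from SNR in dB, using E_b/N_0 = SNR / eta (eta in bpcu)."""
    return snr_db - 10 * math.log10(eta)

def snr_from_ebn0_db(ebn0_db, eta):
    """Inverse mapping: SNR in dB from E_b/N_0 in dB."""
    return ebn0_db + 10 * math.log10(eta)

print(ebn0_from_snr_db(0.0, 0.5))  # eta = 0.5 bpcu, as in Fig. 8
```

At η = 0.5 bpcu, E_b/N_0 lies about 3.01 dB above the SNR.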



V. Conclusion

In this work, we presented a novel approach to joint channel estimation and decoding (JCED) for spectrally efficient communication over channels with possibly sparse impulse responses. For this, we assumed a pilot-aided transmission scheme that combines bit-interleaved coded modulation (BICM) with orthogonal frequency division multiplexing (OFDM). Our JCED scheme is based on belief propagation (BP) over a loopy factor graph, where our BP implementation combines soft-input soft-output decoding with very efficient approximations of the sum-product algorithm recently proposed under the name relaxed belief propagation (RBP) HI, [38]. Because our JCED scheme requires only about 5NL multiplications per RBP iteration, it can handle long impulse responses, large numbers of OFDM subcarriers, and large constellations. Numerical experiments conducted using N = 1021 subcarriers, up to 256-point QAM constellations, 10000-bit LDPC codes, and channels with length L = 256 and average sparsity E{K} = 64 showed that the BER of BP-JCED is close to genie-aided bounds and much better than the BER of the LASSO-based "compressed channel sensing" (CCS) approach, where sparse-channel estimation is decoupled from data decoding. Moreover, the outage rates observed for BP-JCED exhibit the sparse-channel capacity pre-log factor (1 - K/N), which is impossible to reach using CCS.

Appendix A
Derivation of RBP Quantities $F_{\mathrm{out},i}$ and $\mathcal{E}_{\mathrm{out},i}$

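In this appendix, we derive the RBP quantities $F_{\mathrm{out},i}(y,\hat{z},\mu^z)$ and $\mathcal{E}_{\mathrm{out},i}(y,\hat{z},\mu^z)$ given in (14)-(18). From (D1)-(D2), we have that

$$F_{\mathrm{out},i}(y,\hat{z},\mu^z) = \frac{1}{p_{Y_i}(y)} \int_z z\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z), \tag{28}$$

where $p_{Y_i}(y) = \int_z p_{Y_i|Z_i}(y|z)\,\mathcal{CN}(z;\hat{z},\mu^z)$. From (13), we rewrite $p_{Y_i|Z_i}(y|z)$ as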

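$$p_{Y_i|Z_i}(y|z) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2}\, \mathcal{CN}\!\left(z;\frac{y}{s^{(k)}},\frac{\mu^w}{|s^{(k)}|^2}\right), \tag{29}$$

so that

$$\int_z z\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2} \int_z z\, \mathcal{CN}\!\left(z;\frac{y}{s^{(k)}},\frac{\mu^w}{|s^{(k)}|^2}\right) \mathcal{CN}(z;\hat{z},\mu^z), \tag{30}$$

$$p_{Y_i}(y) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2} \int_z \mathcal{CN}\!\left(z;\frac{y}{s^{(k)}},\frac{\mu^w}{|s^{(k)}|^2}\right) \mathcal{CN}(z;\hat{z},\mu^z). \tag{31}$$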

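Using the property that

$$\mathcal{CN}(x;\theta,\mu^\theta)\,\mathcal{CN}(x;\phi,\mu^\phi) = \mathcal{CN}\!\left(x;\frac{\theta/\mu^\theta+\phi/\mu^\phi}{1/\mu^\theta+1/\mu^\phi},\frac{1}{1/\mu^\theta+1/\mu^\phi}\right) \mathcal{CN}(0;\theta-\phi,\mu^\theta+\mu^\phi), \tag{32}$$

we can rewrite

$$\int_z z\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2}\, \mathcal{CN}\!\left(0;\frac{y}{s^{(k)}}-\hat{z},\frac{\mu^w}{|s^{(k)}|^2}+\mu^z\right) \int_z z\, \mathcal{CN}\!\left(z;\frac{(s^{(k)})^* y\,\mu^z+\hat{z}\,\mu^w}{|s^{(k)}|^2\mu^z+\mu^w},\frac{\mu^w\mu^z}{|s^{(k)}|^2\mu^z+\mu^w}\right) \tag{33}$$

$$= \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2}\, \mathcal{CN}\!\left(0;\frac{y}{s^{(k)}}-\hat{z},\frac{\mu^w}{|s^{(k)}|^2}+\mu^z\right) \frac{(s^{(k)})^* y\,\mu^z+\hat{z}\,\mu^w}{|s^{(k)}|^2\mu^z+\mu^w} \tag{34}$$

$$= \sum_{k=1}^{2^M} \beta_i^{(k)}\, \mathcal{CN}\!\left(y;s^{(k)}\hat{z},|s^{(k)}|^2\mu^z+\mu^w\right) \left(\hat{z}+\frac{(s^{(k)})^*\mu^z\,(y-s^{(k)}\hat{z})}{|s^{(k)}|^2\mu^z+\mu^w}\right), \tag{35}$$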
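and, using the same procedure, we get

$$p_{Y_i}(y) = \sum_{k=1}^{2^M} \beta_i^{(k)}\, \mathcal{CN}\!\left(y;s^{(k)}\hat{z},|s^{(k)}|^2\mu^z+\mu^w\right), \tag{36}$$

where, in both (35) and (36), the factors $1/|s^{(k)}|^2$ were absorbed into the Gaussian terms using $\frac{1}{|s|^2}\,\mathcal{CN}(0;y/s-\hat{z},\mu^w/|s|^2+\mu^z) = \mathcal{CN}(y;s\hat{z},|s|^2\mu^z+\mu^w)$. Finally, equations (28), (35), and (36) combine to give

$$F_{\mathrm{out},i}(y,\hat{z},\mu^z) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}\, \mathcal{CN}\!\left(y;s^{(k)}\hat{z},|s^{(k)}|^2\mu^z+\mu^w\right)}{p_{Y_i}(y)} \left(\hat{z}+\frac{(s^{(k)})^*\mu^z\,(y-s^{(k)}\hat{z})}{|s^{(k)}|^2\mu^z+\mu^w}\right), \tag{37}$$

from which (14) follows immediately, using the quantities defined in (17).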
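The Gaussian product identity (32) that drives this derivation can be sanity-checked numerically. The sketch below evaluates both sides at a single complex point using the circularly-symmetric density CN(x; m, v) = exp(-|x - m|^2/v)/(πv); `cn_pdf` and `product_identity_sides` are hypothetical helpers.

```python
import math

def cn_pdf(x, mean, var):
    """Circularly-symmetric complex Gaussian pdf CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)

def product_identity_sides(x, theta, mu_theta, phi, mu_phi):
    """Return (lhs, rhs) of the Gaussian product identity (32) evaluated at x."""
    lhs = cn_pdf(x, theta, mu_theta) * cn_pdf(x, phi, mu_phi)
    precision = 1 / mu_theta + 1 / mu_phi                     # sum of precisions
    mean = (theta / mu_theta + phi / mu_phi) / precision      # precision-weighted mean
    rhs = cn_pdf(x, mean, 1 / precision) * cn_pdf(0, theta - phi, mu_theta + mu_phi)
    return lhs, rhs

lhs, rhs = product_identity_sides(0.3 + 0.1j, 0.5 - 0.2j, 0.7, -0.1 + 0.4j, 1.3)
print(lhs, rhs)
```

The two printed values agree to floating-point precision; the second factor on the right-hand side does not depend on x, which is why it can be pulled out of the integrals in (33) and (39).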
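From (D1) and (D3), we have that

$$\mathcal{E}_{\mathrm{out},i}(y,\hat{z},\mu^z) = \frac{1}{p_{Y_i}(y)} \int_z |z-F_{\mathrm{out},i}|^2\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z). \tag{38}$$

Similar to (33), we can write

$$\int_z |z-F_{\mathrm{out},i}|^2\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z) = \sum_{k=1}^{2^M} \frac{\beta_i^{(k)}}{|s^{(k)}|^2}\, \mathcal{CN}\!\left(0;\frac{y}{s^{(k)}}-\hat{z},\frac{\mu^w}{|s^{(k)}|^2}+\mu^z\right) \int_z |z-F_{\mathrm{out},i}|^2\, \mathcal{CN}\!\left(z;\frac{(s^{(k)})^* y\,\mu^z+\hat{z}\,\mu^w}{|s^{(k)}|^2\mu^z+\mu^w},\frac{\mu^w\mu^z}{|s^{(k)}|^2\mu^z+\mu^w}\right). \tag{39}$$

Then, using the change of variable $\tilde{z} = z - F_{\mathrm{out},i}$ and absorbing the $s^{(k)}$ terms as we did in (35), we get

$$\int_z |z-F_{\mathrm{out},i}|^2\, p_{Y_i|Z_i}(y|z)\, \mathcal{CN}(z;\hat{z},\mu^z) = \sum_{k=1}^{2^M} \beta_i^{(k)}\, \mathcal{CN}\!\left(y;s^{(k)}\hat{z},|s^{(k)}|^2\mu^z+\mu^w\right) \int_{\tilde{z}} |\tilde{z}|^2\, \mathcal{CN}\!\left(\tilde{z};\hat{z}+\frac{(s^{(k)})^*\mu^z\,(y-s^{(k)}\hat{z})}{|s^{(k)}|^2\mu^z+\mu^w}-F_{\mathrm{out},i},\frac{\mu^w\mu^z}{|s^{(k)}|^2\mu^z+\mu^w}\right) \tag{40}$$

$$= \sum_{k=1}^{2^M} \beta_i^{(k)}\, \mathcal{CN}\!\left(y;s^{(k)}\hat{z},|s^{(k)}|^2\mu^z+\mu^w\right) \left(\left|\hat{z}+\frac{(s^{(k)})^*\mu^z\,(y-s^{(k)}\hat{z})}{|s^{(k)}|^2\mu^z+\mu^w}-F_{\mathrm{out},i}\right|^2+\frac{\mu^w\mu^z}{|s^{(k)}|^2\mu^z+\mu^w}\right), \tag{41}$$

where (41) uses $\int_{\tilde{z}} |\tilde{z}|^2\,\mathcal{CN}(\tilde{z};m,v) = |m|^2 + v$. Finally, using the quantities defined in (17), equations (36), (38), and (41) combine to give the expression for $\mathcal{E}_{\mathrm{out},i}(y,\hat{z},\mu^z)$ given in (18).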
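The closed forms (36), (37), and (41) translate into a few lines of code. This sketch assumes the per-subcarrier model y = s z + w with w ~ CN(0, μ^w), a constellation {s^(k)} with prior probabilities β^(k), and normalizes the mixture weights in the log domain for numerical safety. It illustrates the derived expressions rather than reproducing the authors' implementation, and `rbp_output` is a hypothetical name.

```python
import math

def rbp_output(y, z_hat, mu_z, mu_w, symbols, priors):
    """Sketch of F_out and E_out from (36), (37), (41) for y = s*z + w (assumed model).

    symbols : constellation points s^(k)
    priors  : symbol probabilities beta^(k) (nonnegative, at least one positive)
    """
    log_w, means, variances = [], [], []
    for s, beta in zip(symbols, priors):
        denom = abs(s) ** 2 * mu_z + mu_w                    # |s|^2 mu^z + mu^w
        # log of beta * CN(y; s*z_hat, denom), one summand of (36)
        log_w.append((math.log(beta) if beta > 0 else -math.inf)
                     - math.log(math.pi * denom) - abs(y - s * z_hat) ** 2 / denom)
        means.append(z_hat + s.conjugate() * mu_z * (y - s * z_hat) / denom)
        variances.append(mu_w * mu_z / denom)
    peak = max(log_w)                                        # log-sum-exp stabilization
    w = [math.exp(lw - peak) for lw in log_w]
    total = sum(w)
    gammas = [wk / total for wk in w]                        # summands of (36) divided by p_Y(y)
    f_out = sum(g * m for g, m in zip(gammas, means))        # (37)
    e_out = sum(g * (abs(m - f_out) ** 2 + v)                # (41) divided by p_Y(y), i.e. (38)
                for g, m, v in zip(gammas, means, variances))
    return f_out, e_out

qpsk = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
f, e = rbp_output(0.9 + 0.8j, 0.5 + 0.5j, 1.0, 0.1, qpsk, [0.25] * 4)
```

With a single constellation point the mixture collapses, and the function returns the scalar LMMSE update ẑ + s*μ^z(y - sẑ)/(|s|^2μ^z + μ^w) with variance μ^wμ^z/(|s|^2μ^z + μ^w).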

Appendix B
Derivation of RBP Quantities $F_{\mathrm{in},j}$ and $\mathcal{E}_{\mathrm{in},j}$

In this appendix, we derive the RBP quantities $F_{\mathrm{in},j}(\hat{q},\mu^q)$ and $\mathcal{E}_{\mathrm{in},j}(\hat{q},\mu^q)$ given in (19)-(23). From (D4)-(D6), we note that $F_{\mathrm{in},j}(\hat{q},\mu^q)$ and $\mathcal{E}_{\mathrm{in},j}(\hat{q},\mu^q)$ are the mean and variance, respectively, of the pdf

$$\frac{1}{Z_j}\, p_{X_j}(q)\, \mathcal{CN}(q;\hat{q},\mu^q), \tag{42}$$

where $Z_j = \int_q p_{X_j}(q)\,\mathcal{CN}(q;\hat{q},\mu^q)$. Using (32) together with the definition of $p_{X_j}(\cdot)$ from (01), we find that

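$$p_{X_j}(q)\,\mathcal{CN}(q;\hat{q},\mu^q) = \lambda_j\,\mathcal{CN}(q;0,\mu_j)\,\mathcal{CN}(q;\hat{q},\mu^q) + (1-\lambda_j)\,\delta(q)\,\mathcal{CN}(q;\hat{q},\mu^q) \tag{43}$$

$$= \lambda_j\,\mathcal{CN}(0;-\hat{q},\mu_j+\mu^q)\,\mathcal{CN}\!\left(q;\frac{\hat{q}\,\mu_j}{\mu_j+\mu^q},\frac{\mu_j\mu^q}{\mu_j+\mu^q}\right) + (1-\lambda_j)\,\mathcal{CN}(0;\hat{q},\mu^q)\,\delta(q) \tag{44}$$

$$= \lambda_j\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\,\mathcal{CN}\!\left(q;\frac{\hat{q}\,\mu_j}{\mu_j+\mu^q},\frac{\mu_j\mu^q}{\mu_j+\mu^q}\right) + (1-\lambda_j)\,\mathcal{CN}(\hat{q};0,\mu^q)\,\delta(q), \tag{45}$$

which implies that

$$Z_j = \lambda_j\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q) + (1-\lambda_j)\,\mathcal{CN}(\hat{q};0,\mu^q). \tag{46}$$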

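Thus, the mean obeys

$$F_{\mathrm{in},j}(\hat{q},\mu^q) = \frac{1}{Z_j}\int_q q\, p_{X_j}(q)\,\mathcal{CN}(q;\hat{q},\mu^q) \tag{47}$$

$$= \frac{\lambda_j}{Z_j}\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\,\frac{\hat{q}\,\mu_j}{\mu_j+\mu^q}. \tag{48}$$

Expression (19) then follows directly from (48).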

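Since $F_{\mathrm{in},j}$ is the mean and $\mathcal{E}_{\mathrm{in},j}$ is the variance of the pdf in (42), we can write

$$\mathcal{E}_{\mathrm{in},j}(\hat{q},\mu^q) = \frac{1}{Z_j}\int_q |q-F_{\mathrm{in},j}|^2\, p_{X_j}(q)\,\mathcal{CN}(q;\hat{q},\mu^q) \tag{49}$$

$$= \frac{1}{Z_j}\int_q |q|^2\, p_{X_j}(q)\,\mathcal{CN}(q;\hat{q},\mu^q) - |F_{\mathrm{in},j}|^2 \tag{50}$$

$$= \frac{1}{Z_j}\int_q |q|^2 \left(\lambda_j\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\,\mathcal{CN}\!\left(q;\frac{\hat{q}\,\mu_j}{\mu_j+\mu^q},\frac{\mu_j\mu^q}{\mu_j+\mu^q}\right) + (1-\lambda_j)\,\mathcal{CN}(\hat{q};0,\mu^q)\,\delta(q)\right) - |F_{\mathrm{in},j}|^2 \tag{51}$$

$$= \frac{\lambda_j}{Z_j}\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\left(\frac{|\hat{q}|^2\mu_j^2}{(\mu_j+\mu^q)^2}+\frac{\mu_j\mu^q}{\mu_j+\mu^q}\right) - |F_{\mathrm{in},j}|^2 \tag{52}$$

$$= \frac{\lambda_j}{Z_j}\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\left(\frac{|\hat{q}|^2\mu_j^2}{(\mu_j+\mu^q)^2}+\frac{\mu_j\mu^q}{\mu_j+\mu^q}\right) - \left(\frac{\lambda_j}{Z_j}\,\mathcal{CN}(\hat{q};0,\mu_j+\mu^q)\right)^2 \frac{|\hat{q}|^2\mu_j^2}{(\mu_j+\mu^q)^2}, \tag{53}$$

where the $\delta(q)$ term in (51) vanishes because $|q|^2\delta(q)$ integrates to zero, and (53) substitutes (48) for $F_{\mathrm{in},j}$. Expression (20) then follows by rearranging (53).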
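Likewise, (46), (48), and (53) give the input-side quantities for the Bernoulli-Gaussian prior $p_{X_j}(x) = \lambda_j\,\mathcal{CN}(x;0,\mu_j) + (1-\lambda_j)\,\delta(x)$. In the sketch below, `rbp_input` and `pi_post` (the posterior probability that tap j is active, i.e., the first term of (46) divided by Z_j) are hypothetical names, and the prior parameters λ_j and μ_j are passed as `lam` and `mu_x`.

```python
import math

def cn_pdf(x, mean, var):
    """Circularly-symmetric complex Gaussian pdf CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)

def rbp_input(q_hat, mu_q, lam, mu_x):
    """Sketch of F_in and E_in for p_X(x) = lam*CN(x; 0, mu_x) + (1 - lam)*delta(x).

    Follows (46), (48), and (53).
    """
    active = lam * cn_pdf(q_hat, 0, mu_x + mu_q)        # first term of Z_j in (46)
    inactive = (1 - lam) * cn_pdf(q_hat, 0, mu_q)       # second term of Z_j in (46)
    pi_post = active / (active + inactive)              # posterior activity probability
    gain = mu_x / (mu_x + mu_q)                         # mu_j / (mu_j + mu^q)
    f_in = pi_post * gain * q_hat                       # (48)
    # (53): pi * (|mean|^2 + variance) - |F_in|^2, with variance = gain * mu^q
    e_in = pi_post * (abs(gain * q_hat) ** 2 + gain * mu_q) - abs(f_in) ** 2
    return f_in, e_in

f, e = rbp_input(0.3 - 0.2j, 0.5, 0.25, 1.0)
```

With λ_j = 1 the sketch reduces to the Gaussian-prior update, and with λ_j = 0 it returns zero mean and zero variance, as expected for a tap known to be inactive.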